TL;DR: Erik's articles perform poorly when compared to the rest of our articles published in the same time period.

  • Although Erik wrote the single largest outlier in this period, once outliers are removed from each dataset his articles receive fewer Page Views and Facebook Shares than the rest of the content.

  • While the 1st quartiles of Erik's articles and of all other content are quite similar (about 1,950 vs. 1,990 Page Views), Erik's articles are clustered at the lower end of the traffic scale.


In [102]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [103]:
df_all = pd.read_csv('All content.csv')

In [104]:
df_erik = pd.read_csv('Erik content.csv')

Here we drop any article published after last Friday or before Erik joined the company, and keep only URLs under /articles/, so that the comparison is apples-to-apples. We also remove any article authored by Erik from the "All" dataset.


In [105]:
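# Keep only articles (by URL) published inside the comparison window,
# and drop anything Erik wrote from the "All" dataset.
start = df_erik.at[6, 'Published']   # lower bound of the window
end = df_erik.at[16, 'Published']    # upper bound of the window
df_all = df_all[(df_all.Published > start) & (df_all.Published < end) &
                (df_all['Url'].str.contains('/articles/')) &
                (~df_all.Title.isin(df_erik.Title))]
df_erik = df_erik[df_erik.Published < end]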
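
The filtering described above can be sketched on made-up data. Everything in this example (titles, URLs, dates, and the `start`/`end` window) is hypothetical and only illustrates how the boolean masks combine; it is not the real dataset.

```python
import pandas as pd

# Hypothetical toy data: titles, URLs and dates are invented for illustration.
all_content = pd.DataFrame({
    'Title': ['A', 'B', 'C', 'D'],
    'Url': ['/articles/a', '/videos/b', '/articles/c', '/articles/d'],
    'Published': pd.to_datetime(
        ['2016-01-05', '2016-01-06', '2016-01-07', '2016-02-20']),
})
erik_content = pd.DataFrame({'Title': ['C']})

start = pd.Timestamp('2016-01-01')  # assumed start of the comparison window
end = pd.Timestamp('2016-02-01')    # assumed end of the comparison window

# Each condition is a boolean Series; & combines them row by row.
mask = (
    (all_content['Published'] > start)
    & (all_content['Published'] < end)
    & all_content['Url'].str.contains('/articles/')
    & ~all_content['Title'].isin(erik_content['Title'])  # exclude Erik's titles
)
filtered = all_content[mask]
print(filtered['Title'].tolist())
```

Only row A survives here: B is not an article URL, C is one of Erik's titles, and D falls outside the window. Note that the exclusion must be the boolean mask `~isin(...)` itself, not the titles selected by that mask.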

In [106]:
print('All PVs median')
print(df_all['Page Views'].median())
print('Erik PVs median')
print(df_erik['Page Views'].median())


All PVs median
4302.0
Erik PVs median
3513.5

Above we see that, at first glance, the median Erik article receives about 800 fewer Page Views than the median article in the rest of our data set.
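
As a quick check of that gap, the two medians printed above can be compared directly. This is a small sketch using the printed values as literals; the variable names are ours.

```python
# Medians copied from the printed output above.
all_median = 4302.0
erik_median = 3513.5

gap = all_median - erik_median          # absolute difference in Page Views
gap_pct = 100.0 * gap / all_median      # difference relative to the "All" median

print('Gap: %.1f Page Views (%.1f%% of the All median)' % (gap, gap_pct))
```

The gap is 788.5 Page Views, or roughly 18% of the "All" median.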


In [107]:
df_all['Page Views'].describe()


Out[107]:
count      154.000000
mean      8386.798701
std      12384.205727
min          1.000000
25%       1985.500000
50%       4302.000000
75%      10039.250000
max      83813.000000
Name: Page Views, dtype: float64

In [108]:
df_erik['Page Views'].describe()


Out[108]:
count        28.000000
mean      14936.714286
std       44782.268583
min           2.000000
25%        1946.750000
50%        3513.500000
75%        7900.500000
max      238771.000000
Name: Page Views, dtype: float64
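
Erik's mean is higher than the "All" mean even though his median is lower, which suggests that his single largest article dominates the mean. A rough way to see this, using only the `count`, `mean` and `max` values from the two `describe()` outputs above, is to recompute each mean with its largest article removed. The helper name below is ours and the result is approximate, because the printed means are rounded.

```python
def mean_without_max(count, mean, max_value):
    """Mean of a dataset after dropping its single largest value."""
    total = count * mean
    return (total - max_value) / (count - 1)

# Values copied from the describe() outputs above.
erik_trimmed = mean_without_max(28, 14936.714286, 238771.0)
all_trimmed = mean_without_max(154, 8386.798701, 83813.0)

print('Erik mean without top article: %.2f' % erik_trimmed)
print('All mean without top article:  %.2f' % all_trimmed)
```

With the top article removed from each set, Erik's mean falls to about 6,647 Page Views while the "All" mean falls to about 7,894, which is consistent with the medians.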

Here we plot the distribution of Page Views per article with outliers (both high and low) hidden.


In [110]:
# One column per group; the box plot is drawn with outliers (fliers) hidden.
d = {'erik': df_erik['Page Views'], 'all': df_all['Page Views']}
df = pd.DataFrame(data=d)
df.plot(kind='box', showfliers=False, title="Page Views Distribution")


Out[110]:
<matplotlib.axes._subplots.AxesSubplot at 0x121438a90>
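
For reference, with matplotlib's default whisker setting (`whis=1.5`, which the cell above does not override), a point counts as an outlier when it lies more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch of those limits, computed from the quartiles in the `describe()` outputs above (the function name is ours):

```python
def tukey_fences(q1, q3, whis=1.5):
    """Return the (lower, upper) limits beyond which a point is drawn as a flier."""
    iqr = q3 - q1
    return q1 - whis * iqr, q3 + whis * iqr

# Quartiles copied from the describe() outputs above.
erik_low, erik_high = tukey_fences(1946.75, 7900.5)
all_low, all_high = tukey_fences(1985.5, 10039.25)

print('Erik fences:', erik_low, erik_high)
print('All fences: ', all_low, all_high)
```

Under this rule both lower limits are negative, so no article can be flagged as a low outlier on Page Views; only articles above roughly 16,800 (Erik) or 22,100 (All) Page Views are hidden.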

And here we plot the Facebook Shares per article, also with outliers hidden.


In [111]:
shares = {'erik shares': df_erik['Facebook Shares'],
          'all_shares': df_all['Facebook Shares']}
df_shares = pd.DataFrame(data=shares)
df_shares.plot(kind='box', showfliers=False, title="Facebook Shares Distribution")


Out[111]:
<matplotlib.axes._subplots.AxesSubplot at 0x1200b2450>

Finally, we plot both the Page Views and Facebook Shares with outliers included (indicated by the "+" symbols above the respective box plots).


In [112]:
d = {'erik': df_erik['Page Views'], 'all': df_all['Page Views']}
df = pd.DataFrame(data=d)
df.plot(kind='box', showfliers=True, title="Page Views Distribution")


Out[112]:
<matplotlib.axes._subplots.AxesSubplot at 0x12004bd50>

In [113]:
shares = {'erik shares': df_erik['Facebook Shares'],
          'all_shares': df_all['Facebook Shares']}
df_shares = pd.DataFrame(data=shares)
df_shares.plot(kind='box', showfliers=True, title="Facebook Shares Distribution")


Out[113]:
<matplotlib.axes._subplots.AxesSubplot at 0x12034b1d0>
